Clustering of Defect Reports Using Graph Partitioning Algorithms
Abstract
This paper presents several solutions to the challenging task of clustering software defect reports. Clustering defect reports can help prioritize testing effort and improve understanding of the nature of software defects. Despite challenges posed by the language and semi-structured nature of defect reports, our experiments on data collected from the open source project Mozilla show extremely promising results for clustering software defect reports using natural language processing and graph partitioning techniques. We report results for three models of the textual information in defect reports and three clustering algorithms: normalized cut, size regularized cut, and k-means. Our data collection method allowed us to quickly develop a proof-of-concept setup. In our experiments, normalized cut achieved the best performance in terms of average cluster purity, accuracy, and normalized mutual information.
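Two of the evaluation measures named in the abstract, cluster purity and normalized mutual information (NMI), can be illustrated with a short sketch. The abstract does not give the exact definitions used in the paper, so the code below relies on common textbook forms: purity as the fraction of items in the majority class of their cluster, and NMI normalized by the geometric mean of the two entropies (an assumption; other normalizations exist). The function names and the toy defect categories are hypothetical and are not taken from the Mozilla data.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that fall in the majority class of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((k / n) * math.log(k / n) for k in Counter(assignments).values())


def nmi(clusters, labels):
    """Mutual information divided by sqrt(H(clusters) * H(labels)).

    The geometric-mean normalization is an assumption of this sketch.
    """
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    pc, py = Counter(clusters), Counter(labels)
    mi = sum(
        (k / n) * math.log(k * n / (pc[c] * py[y]))
        for (c, y), k in joint.items()
    )
    denom = math.sqrt(entropy(clusters) * entropy(labels))
    return mi / denom if denom > 0 else 0.0


# Hypothetical toy data: six reports, two true defect categories.
true_labels = ["crash", "crash", "crash", "ui", "ui", "ui"]
predicted = [0, 0, 1, 1, 1, 1]  # one "crash" report placed in the wrong cluster

print("purity:", purity(predicted, true_labels))
print("NMI:", nmi(predicted, true_labels))
```

Accuracy is left out of the sketch because it requires a mapping from clusters to classes, and the abstract does not say how that mapping is made.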
Similar resources
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
Sampling from social networks’s graph based on topological properties and bee colony algorithm
In recent years, the sampling problem in massive social network graphs has attracted much attention, since a small, representative sample can be analyzed far faster than a huge network. Many algorithms have been proposed for sampling social network graphs. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used in so many fields, a great deal of research continues into finding the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and can therefore become trapped in a local optimum. In this paper...
A Graph Based Clustering Method using a Hybrid Evolutionary Algorithm
Clustering of data items is one of the important applications of graph partitioning using a graph model. The pairwise similarities between all data items form the adjacency matrix of a weighted graph that contains all the necessary information for clustering. In this paper we propose a novel hybrid-evolutionary algorithm based on graph partitioning approach for data clustering. The algorithm is...
Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering
Clustering algorithms are highly dependent on factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reducing the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...